(57) Abstract

A method, an apparatus and a computer program product for adaptive, content-based watermark embedding of a digital audio signal
(100) are disclosed. Corresponding watermark extracting techniques are also disclosed. Watermark information (102) is encrypted (120)
using an audio digest signal, i.e., a watermark key (108). To optimally balance inaudibility and robustness when embedding and extracting
watermarks (450), the original audio signal (100) is divided into fixed-length frames (1100, 1120, 1130) in the time domain. Echoes (S'[n],
S"[n]) are embedded in the original audio signal (100) to represent the watermark (450). The watermark (450) is embedded by delaying
and scaling copies of the original audio signal (100) and adding them to the audio signal (100). An embedding scheme (104) is designed for each
frame (1100, 1120, 1130) according to its properties in the frequency domain. Finally, a multiple-echo hopping module (160) is used
to embed and extract watermarks in the frames (1100, 1120, 1130) of the audio signal (100). An audio watermarking system known as
KentMark (Audio) is implemented.
Digital Audio Watermarking Using Content-Adaptive, Multiple Echo Hopping



FIELD OF THE INVENTION

The present invention relates to the field of digital audio signal processing, and in
particular to techniques of watermarking a digital audio signal.

BACKGROUND 

The recent growth of networked multimedia systems has significantly increased the
need for the protection of digital media. This is particularly important for the
protection and enhancement of intellectual property rights. Digital media includes
text, software, and digital audio, video and images. The ubiquity of digital media
available via the Internet and digital library applications has increased the need for
new techniques of digital copyright protection and new measures in data security.
Digital watermarking is a developing technology that attempts to address these
growing concerns. It has become an area of active research in multimedia
technology.

A digital watermark is an imperceptible structure that is embedded in a host media signal.
Watermarking, or data hiding, refers to techniques for embedding such a
structure in digital data. Among data-hiding applications, watermarking embeds the least amount of data but,
conversely, requires the greatest robustness. To be effective, a watermark should be
inaudible or invisible within its host signal. Further, it should be difficult or
impossible to remove by unauthorised access, yet be easily extracted by the owner or an
authorised person. Finally, it should be robust to incidental and/or intentional
distortions, including various types of signal processing and geometric transformation
operations.

Many watermarking techniques have been proposed for text, images and video. They
mainly focus on the invisibility of the watermark and its robustness against various
signal manipulations and hostile attacks. These techniques can be grouped into two
categories: spatial domain methods and frequency domain methods.

In relation to text, image and video data, there is a current trend towards approaches
that make use of information about the human visual system (HVS) in an attempt to
produce a more robust watermark. Such techniques use explicit information about the
HVS to exploit the limited dynamic range of the human eye.

Compared with the development of digital video and image watermarking techniques,
watermarking digital audio presents special challenges. The human auditory system
(HAS) is significantly more sensitive than the HVS. In particular, the HAS is sensitive over
a dynamic range in amplitude of one billion to one and in frequency of one
thousand to one. Sensitivity to additive random noise is also acute. Perturbations in
a sound file can be detected as low as one part in ten million (80 dB below ambient
level).

Generally, the limit of perceptible noise increases as the noise content of a host audio
signal increases. Even so, the typical allowable noise level remains very low.

Therefore, there is clearly a need for a system of watermarking digital audio data that 
is inaudible and robust at the same time. 

SUMMARY 

In accordance with a first aspect of the invention, there is disclosed a method of
embedding a watermark in a digital audio signal. The method includes the step of:

embedding at least one echo dependent upon the watermark in a portion of the digital
audio signal, predefined characteristics of the at least one echo being dependent upon
time and/or frequency domain characteristics of the portion of the digital audio signal
to provide a substantially inaudible and robust embedded watermark in the digital
audio signal.

Preferably, the method includes the step of digesting the digital audio signal to
provide a watermark key, the watermark being dependent upon the watermark key. It
may also include the step of encrypting predetermined information using the
watermark key to form the watermark.

Preferably, the method includes the step of generating the at least one echo with a
delay and an amplitude, relative to the digital audio signal, that make it substantially
inaudible. The values of the delay and the amplitude are programmable.

Two or more echoes having different delays and/or amplitudes can be programmably
sequenced. Two portions of the digital audio signal can be embedded with different
echoes dependent upon the time and/or frequency characteristics of the digital audio
signal.

In accordance with a second aspect of the invention, there is disclosed an apparatus
for embedding a watermark in a digital audio signal. The apparatus includes: a device
for determining time and/or frequency domain characteristics of the digital audio
signal; and a device for embedding at least one echo dependent upon the watermark
in a portion of the digital audio signal, predefined characteristics of the at least one
echo being dependent upon the time and/or frequency domain characteristics of the
portion of the digital audio signal to provide a substantially inaudible and robust
embedded watermark in the digital audio signal.

In accordance with a third aspect of the invention, there is disclosed a computer
program product having a computer readable medium having a computer program
recorded therein for embedding a watermark in a digital audio signal. The computer
program product includes: a module for determining time and/or frequency domain
characteristics of the digital audio signal; and a module for embedding at least one
echo dependent upon the watermark in a portion of the digital audio signal, predefined
characteristics of the at least one echo being dependent upon the time and/or
frequency domain characteristics of the portion of the digital audio signal to provide a
substantially inaudible and robust embedded watermark in the digital audio signal.

In accordance with a fourth aspect of the invention, there is disclosed a method of
embedding a watermark in a digital audio signal. The method includes the steps of:
generating a digital watermark; adaptively segmenting the digital audio signal
dependent upon at least one frequency and/or time domain characteristic into two or
more frames containing respective portions of the digital audio signal; classifying
each frame dependent upon at least one frequency and/or time domain characteristic
of the portion of the digital audio signal in the frame; and embedding at least one echo
in at least one of the frames, the echo being dependent upon the watermark and upon
a classification of each frame determined by the classifying step, whereby a
watermarked digital audio signal is produced.

Preferably, the watermark is dependent upon the digital audio signal. The method
may also include the steps of: audio digesting the digital audio signal to provide an
audio digest; and encrypting watermark information dependent upon the audio digest.

Preferably, the method further includes the step of extracting one or more features
from each frame of the digital audio signal. It may also include the step of selecting
an embedding scheme for each frame dependent upon the classification of each frame,
the embedding scheme being adapted dependent upon at least one time and/or frequency
domain characteristic of the classification for the corresponding portion of the digital
audio signal. The method may further include the step of embedding the
at least one echo in at least one of the frames dependent upon the selected embedding
scheme. The amplitude and the delay of the echo relative to the corresponding
portion of the digital audio signal in the frame are defined dependent upon the
embedding scheme so as to be inaudible. Optionally, at least two echoes are
embedded in the frame.

Preferably, two or more echoes embedded in the digital audio signal are dependent 
upon a bit of the watermark. 

In accordance with a fifth aspect of the invention, there is disclosed an apparatus for
embedding a watermark in a digital audio signal. The apparatus includes: a device for
generating a digital watermark; a device for adaptively segmenting the digital audio
signal dependent upon at least one frequency and/or time domain characteristic into
two or more frames containing respective portions of the digital audio signal; a device
for classifying each frame dependent upon at least one frequency and/or time domain
characteristic of the portion of the digital audio signal in the frame; and a device for
embedding at least one echo in at least one of the frames, the echo being dependent
upon the watermark and upon a classification of each frame determined by the
classifying device, whereby a watermarked digital audio signal is produced.

In accordance with a sixth aspect of the invention, there is disclosed a computer
program product having a computer readable medium having a computer program
recorded therein for embedding a watermark in a digital audio signal. The computer
program product includes: a module for generating a digital watermark; a module for
adaptively segmenting the digital audio signal dependent upon at least one frequency
and/or time domain characteristic into two or more frames containing respective
portions of the digital audio signal; a module for classifying each frame dependent
upon at least one frequency and/or time domain characteristic of the portion of the
digital audio signal in the frame; and a module for embedding at least one echo in at
least one of the frames, the echo being dependent upon the watermark and upon a
classification of each frame determined by the classifying module, whereby a
watermarked digital audio signal is produced.

In accordance with a seventh aspect of the invention, there is disclosed a method of
extracting a watermark from a watermarked digital audio signal. The method
includes the steps of: adaptively segmenting the watermarked digital audio signal into
two or more frames containing corresponding portions of the watermarked digital
audio signal; detecting at least one echo present in the frames; and code mapping the
at least one detected echo to extract an embedded watermark, the mapping being
dependent upon one or more embedding schemes used to embed the at least one echo
in the watermarked digital audio signal.

Preferably, the method further includes the step of audio registering the watermarked
digital audio signal with the original digital audio signal to determine any
unauthorised modifications of the watermarked digital audio signal.
Preferably, the method further includes the step of decrypting the embedded
watermark dependent upon an audio digest signal to derive watermark information,
the audio digest signal being dependent upon an original digital audio signal.

In accordance with an eighth aspect of the invention, there is disclosed an apparatus
for extracting a watermark from a watermarked digital audio signal. The apparatus
includes: a device for adaptively segmenting the watermarked digital audio signal into
two or more frames containing corresponding portions of the watermarked digital
audio signal; a device for detecting at least one echo present in the frames; and a
device for code mapping the at least one detected echo to extract an embedded
watermark, the mapping being dependent upon one or more embedding schemes used
to embed the at least one echo in the watermarked digital audio signal.

In accordance with a ninth aspect of the invention, there is disclosed a computer
program product having a computer readable medium having a computer program
recorded therein for extracting a watermark from a watermarked digital audio signal.
The computer program product includes: a module for adaptively segmenting the
watermarked digital audio signal into two or more frames containing corresponding
portions of the watermarked digital audio signal; a module for detecting at least one
echo present in the frames; and a module for code mapping the at least one detected
echo to extract an embedded watermark, the mapping being dependent upon one or
more embedding schemes used to embed the at least one echo in the watermarked
digital audio signal.

BRIEF DESCRIPTION OF THE DRAWINGS 

A small number of embodiments of the invention are described hereinafter with
reference to the drawings, in which:

Fig. 1 is a high-level block diagram illustrating the watermark embedding process in
accordance with a first embodiment of the invention;

Fig. 2 is a flowchart illustrating the echo hopping process of Fig. 1;

Fig. 3 is a flowchart illustrating the echo embedding process of Fig. 1;
Fig. 4 is a block diagram illustrating the watermark extracting process of Fig. 1;

Fig. 5 is a flowchart illustrating the echo detecting process of Fig. 4;

Fig. 6 is a block diagram depicting the relationship of the encryption and decryption
processes shown in Figs. 1 and 4, respectively;

Fig. 7 is a flowchart of the audio digesting process for generating a watermark key
shown in Fig. 1;

Fig. 8 is a block diagram illustrating a training process to produce classification
parameters and embedding scheme design for audio samples;

Fig. 9 is a flowchart illustrating the audio registration process of Fig. 4;

Fig. 10 is a graphical depiction of frequency characteristics;

Figs. 11A-11D are timing diagrams illustrating the process of embedding echoes in a
digital audio signal to produce a watermarked audio signal; and

Fig. 12 is a diagram illustrating the spectra corresponding to a frame of the original
audio signal shown in Fig. 11A.

DETAILED DESCRIPTION

A method, an apparatus and a computer program product for embedding a watermark
in a digital audio signal are described. Correspondingly, a method, an apparatus and a
computer program product for extracting a watermark from a watermarked audio
signal are also described. In the following description, numerous specific details are
set forth, including specific encryption techniques, to provide a more thorough
description of the embodiments of the present invention. It will be apparent to one
skilled in the art, however, that the present invention may be practised without these
specific details. In other instances, well-known features are not described in detail so
as not to obscure the present invention.
Four accompanying Appendices (1 to 4) form part of this description of the
embodiments of the invention.

The embodiments of the invention provide a solution to the conflicting requirements
of inaudibility and robustness in embedding and extracting watermarks in digital
audio signals. This is done using content-adaptive digital audio watermarking.

While the HAS has a large dynamic range, it often has a fairly small differential
range. Consequently, loud sounds tend to mask out quieter sounds. Additionally,
while the HAS is sensitive to the amplitude and relative phase of a sound,
it has difficulty perceiving absolute phase. Finally, some environmental
distortions are so common that the listener ignores them in most cases. These
characteristics can be exploited in designing watermark embedding
and extracting schemes.

Focusing on issues of inaudibility, robustness and tamper-resistance, four techniques
are disclosed hereinafter. They are:

(1) content-adaptive embedding scheme modelling,

(2) multiple-echo hopping and hiding,

(3) audio registration using a Dynamic Time Warping technique, and

(4) watermark encryption and decryption using an audio digest signal.

An application system called KentMark (Audio) is implemented based on these
techniques. A brief overview of the four techniques employed by the embodiments of
the present invention is set forth first.

Content-Adaptive Embedding 

In the content-adaptive embedding technique, the parameters for setting up the
embedding process vary dependent on the content of an audio signal. For example,
because the content of a frame of digital violin music is very different from that of a
recording of a large symphony orchestra in terms of spectral details, these two
respective music frames are treated differently. By doing so, the embedded
watermark signal better matches the host audio signal so that the embedded signal is
perceptually negligible. This content-adaptive method couples audio content with the
embedded watermark signal. Consequently, it is difficult to remove the embedded
signal without destroying the host audio signal. Since the embedding parameters
depend on the host audio signal, the tamper-resistance of this watermark embedding
technique is also increased.

In broad terms, this technique involves segmenting an audio signal into frames in the
time domain, classifying the frames as belonging to one of several known classes, and
then encoding each frame with an appropriate embedding scheme. The particular
scheme chosen is tailored to the relevant class of audio signal according to its
properties in the frequency domain. To implement the content-adaptive embedding,
two techniques are disclosed: audio-frame classification and embedding-scheme
design.

Multiple Echo Hopping and Hiding 

Essentially, the echo hiding technique embeds a watermark into a host audio signal by
introducing an echo. The embedded watermark itself is a predefined binary code. The
time delay of the echo relative to the original audio signal encodes a binary bit of
the code. Two time delays can be used: one for a binary one and another
for a binary zero. Both time delays are chosen to remain below the threshold at which
the human ear can resolve an echo. Thus, most human beings cannot resolve the resulting
embedded audio as deriving from different sources. In addition to keeping the time
delay short, the distortion must remain imperceptible: the echo's amplitude and its decay rate
are set below the audible threshold of a typical human ear.
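The two-delay principle can be shown with a short sketch. It assumes NumPy, a hypothetical pair of delays (50 and 75 samples) and an exaggerated echo amplitude of 0.3 so that the effect is easy to verify. It is not the KentMark (Audio) implementation. Decoding uses the real cepstrum, a common way to locate an echo's delay that this section does not itself specify.

```python
import numpy as np

# Hypothetical parameters for illustration only (not taken from this disclosure).
DELAY_ZERO = 50   # echo delay in samples encoding bit 0
DELAY_ONE = 75    # echo delay in samples encoding bit 1
ALPHA = 0.3       # echo amplitude; an inaudible scheme would use a far smaller value

def embed_bit(segment: np.ndarray, bit: int) -> np.ndarray:
    """Add one echo whose delay encodes a single watermark bit."""
    delay = DELAY_ONE if bit else DELAY_ZERO
    echo = np.zeros_like(segment)
    echo[delay:] = ALPHA * segment[:-delay]  # delayed, scaled copy of the host
    return segment + echo

def detect_bit(segment: np.ndarray) -> int:
    """Decode the bit by comparing the real cepstrum at the two candidate delays."""
    log_magnitude = np.log(np.abs(np.fft.rfft(segment)) + 1e-12)
    cepstrum = np.fft.irfft(log_magnitude, n=len(segment))
    return int(cepstrum[DELAY_ONE] > cepstrum[DELAY_ZERO])

# Usage: white noise stands in for one segment of host audio.
rng = np.random.default_rng(0)
host = rng.standard_normal(4096)
print(detect_bit(embed_bit(host, 1)), detect_bit(embed_bit(host, 0)))
```

An echo of amplitude a at delay d adds a peak of about a/2 to the real cepstrum at quefrency d. The detector therefore only needs to compare two cepstral coefficients.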

To enhance the robustness and tamper-resistance of an embedded watermark, a
multiple echo-hopping process can be employed. Instead of embedding one echo into
an audio frame, multiple echoes with different time delays can be embedded into the
sub-frames of each audio frame. In other words, one watermark bit is encoded as multiple echoes. For
the same detection rate, the amplitude of each echo can consequently be reduced. For attackers
attempting to defeat the watermark without knowledge of the parameters, this
significantly reduces the possibility of unauthorised echo detection and removal of the
watermark.
Audio Registration Using DTW Technique

To prevent unauthorised attackers from re-scaling, inserting and/or deleting parts of an audio
signal in the time domain, a procedure is provided for registering an audio signal
before watermark extraction.

In the registration process, a Dynamic Time Warping (DTW) technique is employed.
The DTW technique resolves an optimal alignment path between two audio signals.
Both the audio signal under consideration and the reference audio signal are
segmented into fixed-length frames. The power spectral parameters in each frame are
then calculated using a non-linear frequency scale method. An optimal path is
generated that results in the minimal dissimilarity between the reference audio and the
test audio frame sequences. The registration is performed according to this optimal
path. Any possible shifting, scaling, or other non-linear time domain distortion can be
detected and recovered.
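The alignment step can be sketched with textbook dynamic time warping. The sketch makes three assumptions:

* each frame has already been reduced to a feature vector, such as the power spectral parameters mentioned above;
* Euclidean distance serves as the frame dissimilarity;
* only the three standard moves are allowed.

The local path constraints and distance measure actually used by the embodiment are not specified in this section.

```python
import math
from typing import List, Sequence, Tuple

def dtw(reference: List[Sequence[float]],
        test: List[Sequence[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimal cumulative dissimilarity and the optimal alignment path."""
    n, m = len(reference), len(test)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], test[j - 1])  # frame dissimilarity
            cost[i][j] = d + min(cost[i - 1][j],      # advance in reference only
                                 cost[i][j - 1],      # advance in test only
                                 cost[i - 1][j - 1])  # advance in both
    # Backtrack from the end to recover the optimal path of (reference, test) frame pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = {(i - 1, j - 1): cost[i - 1][j - 1],
                      (i - 1, j): cost[i - 1][j],
                      (i, j - 1): cost[i][j - 1]}
        i, j = min(candidates, key=candidates.get)
    path.reverse()
    return cost[n][m], path

# Usage: the test sequence repeats one frame, as if audio had been inserted.
reference = [[0.0], [1.0], [2.0], [3.0]]
test = [[0.0], [1.0], [1.0], [2.0], [3.0]]
print(dtw(reference, test))
```

Each test frame can then be mapped back onto its reference frame by following the path, which undoes the time-domain distortion before echo detection.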

Watermark Encryption & Decryption Using Audio Digest Signal 

To further improve system security and tamper-resistance, an audio digest signal from
the original audio signal is generated as a watermark key to encrypt and decrypt the
watermark signal. This serves to guarantee the uniqueness of a watermark signal, and
prevent unauthorised access to the watermark.

1 Watermark Embedding 

Fig. 1 illustrates a process of embedding watermarks in accordance with a first
embodiment of the invention. A digital audio signal 100 is provided as input to an
audio digest module 130, an audio segmentation module 140, and an echo embedding
module 180. Using the digital audio signal 100, the audio digest module 130
produces a watermark key 108 that is provided as input to an encryption module 120.

The watermark key 108 is an audio digest signal created from the original audio
signal 100. It is also an output of the system. Predefined watermark information 102
is also provided as an input to the encryption module 120. The watermark
information 102 is encrypted using the watermark key 108 and provided as input to an
echo-hopping module 160.

The audio segmentation module 140 segments the digital audio signal 100 into two or
more segments or frames. The segmented audio signal is provided as input to a
feature extraction module 150. Feature measures are extracted from each frame to
represent the characteristics of the audio signal in that frame. An exemplary feature
extraction method using a non-linear frequency scale technique is described in
Appendix 1. While a specific method is set forth, it will be apparent to one skilled in
the art, in view of the disclosure herein, that other techniques can be practised
without departing from the scope and spirit of the invention. The feature extraction
process is the same as the one used in the training process described hereinafter with
reference to Fig. 8.
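Appendix 1 is not reproduced in this section, so the following is only a sketch of one common non-linear-frequency feature: log power in bands spaced evenly on the mel scale. The mel formula, the Hann window and the choice of 20 bands are assumptions, not details taken from the disclosure. The sketch uses NumPy.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Common mel-scale formula (an assumed choice of non-linear frequency scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def band_log_energies(frame: np.ndarray, sample_rate: int, n_bands: int = 20) -> np.ndarray:
    """Return the log power in n_bands bands spaced evenly on the mel scale."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 1))
    features = np.empty(n_bands)
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        features[b] = np.log(power[in_band].sum() + 1e-12)  # small floor avoids log(0)
    return features

# Usage: a 1 kHz tone sampled at 8 kHz should peak in the band containing 1 kHz.
fs = 8000
t = np.arange(512) / fs
feats = band_log_energies(np.sin(2 * np.pi * 1000 * t), fs)
print(int(np.argmax(feats)))
```

Mel bands are narrow at low frequencies and wide at high frequencies. This roughly follows the ear's frequency resolution, which is the motivation for a non-linear scale.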

The extracted features from each frame of digital audio data 100 are provided as input
to the classification and embedding scheme selection module 170. This module 170 also
receives classification parameters 106 and embedding schemes 104 as input. The
parameters of the classifier and the embedding schemes are generated in the training
process. Based on the feature measures, each audio frame is classified into one of the
pre-defined classes and an embedding scheme is selected.
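The selection performed by module 170 can be sketched as a nearest-centroid rule, which fits the cluster centroids produced by the training process described later. The classifier actually used is not specified here. The class labels, feature values and `EmbeddingScheme` fields below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

@dataclass(frozen=True)
class EmbeddingScheme:
    """Hypothetical per-class parameters: one (delay, amplitude) pair per sub-frame."""
    echoes_for_zero: Tuple[Tuple[int, float], ...]
    echoes_for_one: Tuple[Tuple[int, float], ...]

def classify(features: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Assign a frame to the class whose centroid is nearest in feature space."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

def select_scheme(features: Sequence[float],
                  centroids: Dict[str, Sequence[float]],
                  schemes: Dict[str, EmbeddingScheme]) -> EmbeddingScheme:
    """Pick the embedding scheme designed for the frame's class."""
    return schemes[classify(features, centroids)]

# Usage with made-up two-dimensional features and two classes.
centroids = {"sparse": [0.0, 0.0], "dense": [10.0, 10.0]}
schemes = {
    "sparse": EmbeddingScheme(((40, 0.05), (60, 0.05)), ((55, 0.05), (75, 0.05))),
    "dense": EmbeddingScheme(((40, 0.12), (60, 0.12)), ((55, 0.12), (75, 0.12))),
}
print(classify([1.0, 2.0], centroids))
```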

The output of the classification and embedding scheme selection module 170 is
provided as an input to the echo-hopping module 160. Each embedding scheme is
tailored to a class of audio signal. Using the selected embedding scheme, the
watermark is embedded into the audio frame using a multiple-echo hopping process.
This produces a particular arrangement of echoes that are to be embedded in the
digital audio signal 100 dependent upon the encrypted watermark produced by the
module 120. The echo hopping sequence and the digital audio signal 100 are
provided as input to the echo embedding module 180. The echo embedding
module 180 produces the watermarked audio signal 110 by embedding the echo
hopping sequence into the digital audio signal 100. Thus, the watermark embedding
process of Fig. 1 produces two outputs: a watermark key 108 digested from the
original audio signal 100 and the final watermarked audio signal 110.
The foregoing embodiment of the invention and the corresponding watermark
extraction process described hereinafter can be implemented in hardware or software
form. That is, the functionality of each module can be implemented electronically or
as software that is carried out using a computer. For example, the embodiment can be
implemented as a computer program product. A computer program for embedding a
watermark in a digital audio signal can be stored on a computer readable medium.
Likewise, the computer program can be one for extracting a watermark from a
watermarked audio signal. In each case, the computer program can be read from the
medium by a computer, which in turn carries out the operations of the computer
program. In yet another embodiment, the system depicted in Fig. 1 can be
implemented as an Application Specific Integrated Circuit (ASIC), for example. The
watermark embedding and extracting processes are capable of being implemented in a
number of other ways, which will be apparent to those skilled in the art in view of this
disclosure, without departing from the scope and spirit of the invention.

1.1 Echo Hopping

Fig. 2 illustrates the functionality of the echo-hopping module 160 of Fig. 1 in further
detail. To gain robustness in any subsequent detection process carried out on a
watermarked audio signal, multiple echo hopping is employed: each audio frame is
divided into multiple sub-frames, and a bit in the watermark sequence is encoded as
multiple echoes. Processing commences at step 200, in which each frame
of the digital audio signal is divided into two or more sub-frames.

In step 210, the embedding scheme 104 selected by the module 170 of Fig. 1 is
mapped onto the sub-frames. In step 220, the sub-frames are encoded according to the
selected embedding scheme. Each sub-frame carries one echo, and for each echo there
is a set of parameters determined in the embedding scheme design. In this way, one
bit of the watermark is encoded as multiple echoes in various patterns. This significantly
reduces the possibility of echo detection and removal by attackers, since the
parameters corresponding to each echo are unknown to them. In addition, more
patterns are available to choose from when embedding a bit. Processing then terminates.
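Steps 200 to 220 can be sketched with NumPy as follows. Each sub-frame receives one echo whose (delay, amplitude) pair comes from the pattern selected for the current bit. The list-of-pairs pattern format and the equal-length split, with the last sub-frame absorbing any remainder, are assumptions made for illustration.

```python
import numpy as np
from typing import List, Tuple

Echo = Tuple[int, float]  # hypothetical format: (delay in samples, amplitude)

def hop_embed(frame: np.ndarray, bit: int,
              pattern_zero: List[Echo], pattern_one: List[Echo]) -> np.ndarray:
    """Embed one watermark bit as one echo per sub-frame."""
    pattern = pattern_one if bit else pattern_zero
    n_sub = len(pattern)                      # step 200: one sub-frame per echo
    sub_len = len(frame) // n_sub
    original = np.asarray(frame, dtype=float)
    out = original.copy()
    for j, (delay, amplitude) in enumerate(pattern):  # step 210: map scheme to sub-frames
        start = j * sub_len
        stop = len(frame) if j == n_sub - 1 else start + sub_len
        sub = original[start:stop]
        if not 0 < delay < len(sub):
            raise ValueError("delay must be positive and shorter than the sub-frame")
        # Step 220: add the delayed, scaled copy inside this sub-frame only.
        out[start + delay:stop] += amplitude * sub[:len(sub) - delay]
    return out

# Usage: different delay/amplitude patterns for bit 0 and bit 1 over three sub-frames.
pattern_zero = [(2, 0.1), (1, 0.2), (2, 0.3)]
pattern_one = [(1, 0.1), (2, 0.2), (1, 0.3)]
print(hop_embed(np.ones(12), 1, pattern_zero, pattern_one))
```

Because the echo parameters change from one sub-frame to the next, an attacker who estimates a single delay over the whole frame does not recover the pattern.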
1.2 Echo Embedding

Fig. 3 illustrates in further detail the functionality of the echo-embedding module 180
of Fig. 1 for embedding an echo into the audio signal. A sub-frame 300 is
provided as input to step 310 to calculate the delay of the original audio signal 100. In
step 320, the predetermined delay is applied to a copy of the original digital audio signal
in the sub-frame to produce the echo. The amplitude of the time-delayed
audio signal is also adjusted so that it is substantially inaudible. In this echo
embedding process, an audio frame is segmented into fixed-length sub-frames, and each
sub-frame is encoded with one echo. For the jth sub-frame of the ith frame, the embedded
audio signal $\tilde{S}_{ij}(n)$ is expressed as follows:

$$\tilde{S}_{ij}(n) = S_{ij}(n) + a_{ij}\, S_{ij}(n - \delta_{ij}) \qquad (1)$$

$$S_{ij}(k) = 0 \quad \text{if } k < 0 \qquad (2)$$

where $S_{ij}(n)$ is the original audio signal of the jth sub-frame in the ith frame, $a_{ij}$ is
the amplitude scaling factor, and $\delta_{ij}$ is the time delay corresponding to either bit 'one'
or bit 'zero'.
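Equations (1) and (2) translate directly into array operations. The sketch below assumes NumPy and integer sample delays. Filling the delayed copy with zeros before the sub-frame start implements the convention of equation (2).

```python
import numpy as np

def embed_echo(sub_frame, delay: int, amplitude: float) -> np.ndarray:
    """Compute S~_ij(n) = S_ij(n) + a_ij * S_ij(n - delta_ij), taking S_ij(k) = 0 for k < 0."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    s = np.asarray(sub_frame, dtype=float)
    delayed = np.zeros_like(s)                 # equation (2): zero before the sub-frame start
    if delay < len(s):
        delayed[delay:] = s[:len(s) - delay]   # S_ij(n - delta_ij)
    return s + amplitude * delayed             # equation (1)

# Usage: a delay of 2 samples and a scale factor of 0.5 on a four-sample sub-frame.
print(embed_echo([1.0, 2.0, 3.0, 4.0], delay=2, amplitude=0.5))
```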

Figs. 11A-11D are timing diagrams illustrating this process. With reference to Fig. 11A, a
frame 1100 of an original digital audio signal S[n] is shown. Preferably, the frames
are of fixed length. The amplitude of the signal S[n] is shown normalised to a scale
of -1 to 1. Dependent upon the content of the audio signal S[n], it is processed as a
number of frames (only one of which is shown in Fig. 11). Fig. 12 depicts exemplary
spectra for the frame 1100. In turn, the representative frame 1100 is processed as
three sub-frames 1110, 1120, 1130 with starting points n0, n1 and n2, respectively, in
this example.

The first sub-frame 1110, which starts at n0 and ends before n1, is embedded with an
echo S'[n] shown in Fig. 11B, where S'[n] = a1 × S[n - δ1]. The second sub-frame 1120
is embedded with an echo S"[n] shown in Fig. 11C, where S"[n] = a2 × S[n - δ2]. The
scale factors a1 and a2 are chosen so that both echoes are significantly smaller in amplitude
than the audio signal S[n]. Likewise, the delays δ1 and δ2 are short enough not to be
resolved by the HAS. The resulting frame 1100 of the watermarked audio signal,
S[n] + S'[n] + S"[n], is shown in Fig. 11D. The difference between frame 1100 in
Fig. 11A and in Fig. 11D is virtually undetectable to the HAS.

2 Watermark Encryption and Decryption

The relationship between the encryption and decryption processes is shown in Fig. 6.
Encryption 600 is a process of encoding a message or data, e.g. plain text 620, to
produce a representation of the message that is unintelligible or difficult to decipher.
It is conventional to refer to such a representation as cipher text 640.
Decryption 610 is the inverse process that transforms an encrypted message 640 back
into its original form 620. Cipher text and plain text are merely naming conventions.

Some form of encryption/decryption key 630 is used in both processes 600, 610.

Formally, the transformations between plain text and cipher text are denoted
C = E(K, P) and P = D(K, C), where C represents the cipher text, E is the encryption
process, P is the plain text, D is the decryption process, and K is a key that provides
additional security.
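The relationship C = E(K, P), P = D(K, C) can be illustrated with a toy stream cipher. It XORs the plain text with a keystream expanded from K using SHA-256 in counter mode. This is a stand-in chosen only because it is short and self-inverse. It is not the cipher used by the embodiment, and it is not a vetted construction. In particular, reusing the same key for two messages leaks their XOR.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Expand the key into `length` pseudo-random bytes (SHA-256 in counter mode)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])

def encrypt(key: bytes, plain_text: bytes) -> bytes:
    """C = E(K, P): XOR the plain text with a key-derived byte stream."""
    return bytes(p ^ k for p, k in zip(plain_text, _keystream(key, len(plain_text))))

def decrypt(key: bytes, cipher_text: bytes) -> bytes:
    """P = D(K, C): XOR with the same stream undoes the encryption."""
    return encrypt(key, cipher_text)

# Usage: the key would be the audio digest (watermark key); here it is a placeholder.
key = b"hypothetical watermark key"
cipher = encrypt(key, b"watermark information")
print(decrypt(key, cipher))
```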

Many forms of encryption and corresponding decryption that are well known to those
skilled in the art can be practised with the invention, for example LZW
encryption.

2.1 Audio Digest

Fig. 7 is a flow diagram depicting a process of generating an audio digest signal used
as a security key to encrypt and decrypt watermark information to produce a
watermark. The original audio signal 700 is provided as input to step 710, which
performs a hash transform on the audio signal 700. In particular, a one-way hash
function is employed. A hash function converts or transforms data to an "effectively"
unique representation, normally much smaller in size: different input values produce
different output values with overwhelming probability. The transformation can be expressed as follows:

$$K = H(S)$$

where S denotes the original audio signal, K denotes the audio digest signal, and H
denotes the one-way hash function.

In step 720, a watermark key is generated. The watermark key produced is therefore
a shorter representation of the input digital audio data. Processing then terminates.
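Steps 710 and 720 can be sketched with Python's standard `hashlib`. The section names only "a one-way hash function", so two details are assumptions here: SHA-256 as that function, and quantising samples in [-1, 1] to 16-bit little-endian PCM before hashing.

```python
import hashlib
import struct
from typing import Sequence

def audio_digest(samples: Sequence[float]) -> bytes:
    """K = H(S): SHA-256 of the signal quantised to 16-bit little-endian PCM."""
    quantised = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    pcm = struct.pack(f"<{len(quantised)}h", *quantised)
    return hashlib.sha256(pcm).digest()

# Usage: the 32-byte digest serves as the watermark key.
key = audio_digest([0.0, 0.5, -0.5, 0.25])
print(len(key), key.hex()[:16])
```

Whatever the length of the input signal, the digest is a fixed 32 bytes. This matches the description of the key as a much shorter representation of the audio.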

3 Adaptive Embedding Scheme Modelling 

Modelling of the adaptive embedding process is an essential aspect of the
embodiments of the invention. It includes two key parts:

1. Audio clustering and embedding process design (in other words, the training process); and

2. Audio classification and embedding scheme selection.

Fig. 8 depicts the training process for an adaptive embedding model. Adaptive
embedding, or content-sensitive embedding, embeds watermarks differently for
different types of audio signals. To do so, a training process is run for each category
of audio signal to define embedding schemes that are well suited to that particular
category or class of audio signal. The training process analyses the training audio
signal 800 to find an optimal way to classify audio frames into classes and then designs
an embedding scheme for each of those classes.

Training sample data 800 is provided as input to an audio segmentation module 810.
The training data should be sufficient in quantity to be statistically significant. The
resulting segmented audio is provided as input to a feature-extraction module 820 and
to the embedding-scheme design module 840. A model of the human auditory system
(HAS) 806 is also provided as input to the feature-extraction module 820, the feature-
clustering module 830, and the embedding-scheme design module 840. In this way,
both inaudibility (that is, the sensitivity of the human auditory system) and resistance
to attacks are taken into consideration.
The extracted features produced by module 820 are provided as input to the feature-
clustering module 830. The feature-clustering module 830 produces the classification
parameters and provides input to the embedding-scheme design module 840.
Audio signal frames are clustered into data clusters, each of which forms a partition in
the feature vector space and is represented by its centroid. Since the audio frames in a
cluster are similar, embedding schemes are designed dependent on the centroid of the
cluster and the human auditory system model 806. The embedding-scheme design
module 840 produces a number of embedding schemes 804 as output. Each embedding
scheme design must be tested to ensure the inaudibility and robustness of the resulting
watermark. In this way, an embedding scheme best suited to the host signal is designed
for each class (cluster) of signal.

The training process need only be performed once for a category of audio signals. The 
derived classification parameters and the embedding schemes are used to embed 
watermarks in all audio signals in that category. 

With reference to the audio classification and embedding scheme selection module
170 of Fig. 1, similar pre-processing is conducted to convert the incoming audio
signal into a sequence of feature frames. Each frame is classified into one of the
predefined classes, and an embedding scheme, referred to as the content-adaptive
embedding scheme, is chosen for that frame. In this way, the watermark code is
embedded frame-by-frame into the host digital audio signal.

An exemplary process of audio embedding modelling is set forth in detail in 
Appendix 3. 

4 Watermark Extracting 

Fig. 4 illustrates a process of watermark extraction. A watermarked audio signal 110
is optionally provided as input to an audio registration module 460. This module 460
is a preferred, but optional, feature of the embodiment shown in Fig. 4. The module
460 pre-processes the watermarked audio signal 110 with reference to the original
audio signal 100 so that the watermarked audio signal 110 can be recovered from
distortions. This is described in greater detail hereinafter.
The watermarked audio signal 110 is then provided as input to the audio segmentation
module 400, which segments the (registered) watermarked audio signal 110 into frames
using the same segmentation method as in the embedding process of Fig. 1. The output
of module 400 is provided as input to the echo-detecting module 410.

The echo-detecting module 410 detects any echoes present in the audio frame
currently being processed. Echo detection is applied to extract echo delays on a
frame-by-frame basis. Because a single bit of the watermark is hopped into multiple
echoes through echo hopping in the embedding process of Fig. 1, multiple delays are
detected in each frame. This method is more robust against attacks than a single-echo
hiding technique. First, each frame is encoded with multiple echoes, and an attacker
does not know the coding scheme. Second, because multiple echoes are used, each
echo signal is weaker and therefore better hidden.
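The embedding side of echo hopping is not detailed in this section. As a sketch only, assuming each echo is a delayed, scaled copy of the frame added back to it, $s'[n] = s[n] + \sum_k a_k\, s[n - d_k]$, one watermark bit can be spread over several echoes as follows. The delay and amplitude table `ECHO_TABLE` is hypothetical; in the invention these parameters come from the class-dependent embedding schemes.

```python
import numpy as np

# Hypothetical per-bit echo parameters: (delays in samples, amplitudes).
ECHO_TABLE = {
    0: ([100, 150], [0.3, 0.2]),
    1: ([120, 170], [0.3, 0.2]),
}


def embed_echoes(frame, delays, amplitudes):
    """Return s'[n] = s[n] + sum_k a_k * s[n - d_k] within one frame."""
    frame = np.asarray(frame, dtype=float)
    out = frame.copy()
    for d, a in zip(delays, amplitudes):
        if d >= len(frame):
            continue  # an echo longer than the frame contributes nothing here
        # Add a delayed, scaled copy; samples before n = d receive no echo.
        out[d:] += a * frame[:len(frame) - d]
    return out


def embed_bit(frame, bit, table=ECHO_TABLE):
    """Hop one watermark bit into the multiple echoes listed for it in `table`."""
    delays, amplitudes = table[bit]
    return embed_echoes(frame, delays, amplitudes)
```

Feeding an impulse through `embed_bit` makes the structure visible: the output contains the impulse plus one scaled copy at each delay assigned to that bit.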

The detected echoes determined by module 410 are provided as input to the code-mapping
module 420. This module 420 also receives the embedding schemes 104 as input
and produces the encrypted watermark, which is provided as output to the decryption
module 430. The code-mapping module 420 thus performs the inverse of the operation
of step 160 in Fig. 1.
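One plausible way to realise the code-mapping step is sketched below. It assumes that the echo detector supplies an autocorrelation score for each candidate delay and that each bit value owns a distinct set of delays in the embedding scheme; the function name `map_code`, the delay sets and the scores are hypothetical. The bit whose delays carry the largest summed score is chosen.

```python
def map_code(scores, zero_delays, one_delays):
    """Decide one bit from per-delay detection scores.

    scores: dict mapping delay (samples) -> autocorrelation value at that lag.
    zero_delays / one_delays: delays the embedding scheme uses for bit 0 / bit 1.
    """
    zero = sum(scores.get(d, 0.0) for d in zero_delays)
    one = sum(scores.get(d, 0.0) for d in one_delays)
    return 0 if zero >= one else 1


# Made-up scores: strong spikes at 100 and 150 samples decode as bit 0.
print(map_code({100: 0.9, 150: 0.4, 120: 0.05, 170: -0.02}, [100, 150], [120, 170]))  # 0
```

Summing over all echoes of a bit, rather than relying on the single strongest spike, reflects the robustness argument above: an attack that masks one echo need not flip the decoded bit.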

The decryption module 430 also receives the watermark key 108 as input, since the
extracted codes must be decrypted using the watermark key to recover the actual
watermark. The output of the decryption module 430 is provided to the watermark
recovering module 440, which produces the original watermark 450 as its output by
reconstructing a message from the recovered binary sequence. The watermark 450
corresponds to the watermark information 102 of Fig. 1.
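The document does not specify how the recovered binary sequence is turned back into a message. A minimal sketch, assuming the watermark information was UTF-8 text packed most-significant-bit first into whole bytes (the function name `bits_to_message` is hypothetical):

```python
def bits_to_message(bits):
    """Pack a 0/1 sequence (MSB first) into bytes and decode it as UTF-8 text."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    data = bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits), 8)
    )
    return data.decode("utf-8")
```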

4.1 Echo Detecting

Fig. 5 is a detailed flowchart illustrating the echo detecting process of Fig. 4. The key
step is detecting the spacing between the echoes. To do this, the magnitude of the
autocorrelation of the embedded signal's cepstrum is examined at the relevant locations
in each audio frame. Processing commences in step 500, in which a watermarked
audio frame is converted into the frequency domain. In step 510, the complex
logarithm (i.e., log(a + bj)) is calculated. In step 520, the inverse fast Fourier
transform (IFFT) is computed.

In step 530, the autocorrelation is calculated. Cepstral analysis uses a form of
homomorphic system that converts a convolution operation into an addition
operation, which makes it useful for detecting the existence of echoes. In the
autocorrelation of the cepstrum, the echoes in each audio frame appear as a "power
spike" at each echo delay. Thus, in step 540, the time delay corresponding to each
"power spike" is searched for. In step 550, the code corresponding to the delays is
determined. Processing then terminates. An exemplary echo detecting process is set
forth in detail in Appendix 2.
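By way of illustration only (this standard derivation is not part of the original description), assume a frame carries a single echo of amplitude a, with 0 < a < 1, and delay d samples. The logarithm turns the echo's multiplicative effect on the spectrum into an additive term whose inverse transform is a train of impulses:

```latex
\begin{aligned}
s'(n) &= s(n) + a\,s(n-d)
  \;\Longrightarrow\; S'(e^{j\omega}) = S(e^{j\omega})\,\bigl(1 + a\,e^{-j\omega d}\bigr),\\
\log S'(e^{j\omega}) &= \log S(e^{j\omega}) + \sum_{k=1}^{\infty} \frac{(-1)^{k+1} a^{k}}{k}\, e^{-j\omega k d},\\
\hat{s}'(n) &= \hat{s}(n) + a\,\delta(n-d) - \frac{a^{2}}{2}\,\delta(n-2d) + \frac{a^{3}}{3}\,\delta(n-3d) - \cdots
\end{aligned}
```

The echo therefore adds isolated spikes at n = d, 2d, 3d, ... to the cepstrum, which is why searching the cepstral domain at the candidate delays (step 540) can recover d.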

5 Audio Registration 

Fig. 9 illustrates the audio registration process of Fig. 4 that is performed before
watermark detection. Audio registration is a pre-processing technique to recover a
signal from potential attacks, such as the insertion or deletion of a frame, or re-scaling
in the time domain. A watermarked audio signal 900 and an original signal 902 are
provided as input. In step 910, the two input signals 900, 902 are segmented and a
fast Fourier transform (FFT) is performed on each. In step 920, for each input signal,
the power in each frame is calculated using the mel scale. In step 930, the best time
alignment between the two frame sequences is found using the dynamic time-warping
(DTW) procedure, which registers the audio signals by comparing the watermarked
signal with the original signal. This procedure is set forth in detail in Appendix 4. In
step 940, an audio registration is made accordingly. Processing then terminates.

In the foregoing manner, a method, apparatus, and computer program product for
embedding a watermark in a digital audio signal are disclosed. A corresponding
method, apparatus, and computer program product for extracting a watermark from a
watermarked audio signal are also disclosed. Only a small number of embodiments are
described. However, it will be apparent to one skilled in the art in view of this
disclosure that numerous changes and/or modifications can be made without departing
from the scope and spirit of the invention.
APPENDIX 1



A Feature Extraction Method Using Mel Scale Analysis 

An audio signal is first segmented into frames. Spectral analysis is applied to each
frame to extract features from that portion of the signal for further processing. Mel
scale analysis is employed here as an example.

Psychophysical studies have shown that human perception of the frequency content of
sounds, either for pure tones or for music signals, does not follow a linear scale.
There are many non-linear frequency scales that approximate the sensitivity of the
human ear. The mel scale is widely used because it has a simple analytical form:

$$m = 1125 \ln(0.0016 f + 1), \qquad f > 1000\ \text{Hz}, \tag{4}$$

where f is the frequency in Hz and m is the mel-scaled frequency. For f < 1000 Hz,
the scale is linear.

An example procedure of feature extraction is as follows:

(1) Segment the audio signal into m fixed-length frames;

(2) For each audio frame $s_i(n)$, apply a Fast Fourier Transform (FFT):

$$S_i(e^{j\omega}) = F(s_i(n)); \tag{5}$$

(3) Define a frequency band $[f_{\min}, f_{\max}]$ in the spectrum;

(4) Determine the channel numbers $n_1$ and $n_2$, where $n_1$ is the number of
channels for f < 1 kHz and $n_2$ is the number of channels for f > 1 kHz;

(5) For f < 1 kHz, calculate the bandwidth of each band:

$$b = \frac{1000 - f_{\min}}{n_1}; \tag{6}$$

(6) For f < 1 kHz, calculate the center frequency of each band:

$$f_c^{(i)} = i\,b + f_{\min}, \qquad i = 1, \ldots, n_1; \tag{7}$$

(7) For f > 1 kHz, calculate the maximum and minimum mel scale frequency:

$$m_{\max} = 1125 \ln(0.0016 f_{\max} + 1), \qquad m_{\min} = 1125 \ln(0.0016 \times 1000 + 1); \tag{8}$$

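(8) For f > 1 kHz, calculate the mel scale frequency interval of each band:

$$\Delta m = \frac{m_{\max} - m_{\min}}{n_2}; \tag{9}$$

(9) For f > 1 kHz, calculate the center frequency of each band:

$$f_c^{(i)} = \left(\exp\!\left(\frac{i\,\Delta m + m_{\min}}{1125}\right) - 1\right) \Big/ 0.0016; \tag{10}$$

(10) For f > 1 kHz, calculate the bandwidth of each band:

$$b_i = f_c^{(i+1)} - f_c^{(i)}; \tag{11}$$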

(11) For each center frequency and bandwidth, determine a triangle window
function such as that shown in Fig. 10:

$$w(f) = \begin{cases} \dfrac{f - f_l}{f_c - f_l}, & f_l \le f \le f_c, \\[4pt] \dfrac{f - f_r}{f_c - f_r}, & f_c < f \le f_r, \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{12}$$

where $f_c$, $f_l$ and $f_r$ are the center frequency, minimum frequency and maximum
frequency of each band;

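(12) For each band j, calculate its spectral power:

$$P_j = \sum_{f} w_j(f)\,\lvert S_j(f) \rvert^{2}, \tag{13}$$

where $S_j$ is the spectrum of the jth frequency band;

(13) For bands satisfying $f_c < 1000$ Hz, calculate their power summation:

$$P_{f<1\,\text{kHz}} = \sum_{f_c^{(j)} < 1000\ \text{Hz}} P_j; \text{ and} \tag{14}$$

(14) For bands satisfying $f_c > 1000$ Hz, calculate their power summation:

$$P_{f>1\,\text{kHz}} = \sum_{f_c^{(j)} > 1000\ \text{Hz}} P_j. \tag{15}$$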
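The procedure above can be sketched in Python with NumPy. This is a simplified reading under stated assumptions, not the invention's exact filter bank: the band layout (n1 linear bands from f_min to 1 kHz, n2 mel-spaced bands from 1 kHz to f_max), the default parameter values, and placing each triangle's peak at the middle of its band are all choices made for this sketch, and the centre and bandwidth conventions of Eqs. (6) to (11) may differ. f_max should not exceed half the sampling rate fs.

```python
import numpy as np


def mel(f):
    """Eq. (4): mel-scaled frequency, applied above 1 kHz."""
    return 1125.0 * np.log(0.0016 * np.asarray(f, dtype=float) + 1.0)


def inv_mel(m):
    """Inverse of Eq. (4), as used in Eq. (10)."""
    return (np.exp(np.asarray(m, dtype=float) / 1125.0) - 1.0) / 0.0016


def band_edges(f_min, f_max, n1, n2):
    """n1 linear bands on [f_min, 1 kHz] followed by n2 mel-spaced bands on [1 kHz, f_max]."""
    low = np.linspace(f_min, 1000.0, n1 + 1)
    high = inv_mel(np.linspace(mel(1000.0), mel(f_max), n2 + 1))
    return np.concatenate([low, high[1:]])  # n1 + n2 + 1 edges


def triangle(freqs, f_l, f_c, f_r):
    """Eq. (12): weight 1 at f_c, falling linearly to 0 at f_l and f_r."""
    w = np.zeros_like(freqs)
    rise = (freqs >= f_l) & (freqs <= f_c)
    fall = (freqs > f_c) & (freqs <= f_r)
    w[rise] = (freqs[rise] - f_l) / (f_c - f_l)
    w[fall] = (f_r - freqs[fall]) / (f_r - f_c)
    return w


def frame_features(frame, fs, f_min=100.0, f_max=8000.0, n1=10, n2=10):
    """Return the band powers P_j and the sums P_{f<1kHz}, P_{f>1kHz} for one frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2            # Eq. (5), squared magnitude
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = band_edges(f_min, f_max, n1, n2)
    centres = 0.5 * (edges[:-1] + edges[1:])
    bands = np.array([np.sum(triangle(freqs, lo, c, hi) * power)   # Eq. (13)
                      for lo, c, hi in zip(edges[:-1], centres, edges[1:])])
    p_low = bands[centres < 1000.0].sum()               # Eq. (14)
    p_high = bands[centres > 1000.0].sum()              # Eq. (15)
    return bands, p_low, p_high
```

The vector of band powers can serve as the frame's feature vector, while the two sums feed the echo-count decision of Appendix 3.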
APPENDIX 2
An Echo Detection Method Using Cepstral Analysis 

This process involves the following steps: 

(1) For each audio frame $s_i(n)$, calculate the Fourier transform:

$$S_i(e^{j\omega}) = F(s_i(n)); \tag{16}$$

(2) Take the complex logarithm of $S_i(e^{j\omega})$:

$$\log S_i(e^{j\omega}) = \log F(s_i(n)); \tag{17}$$

(3) Take the inverse Fourier transform (cepstrum):

$$\hat{s}_i(n) = F^{-1}\big(\log F(s_i(n))\big); \tag{18}$$

(4) Take the autocorrelation of the cepstrum:

$$R_{\hat{s}\hat{s}}(n) = \sum_{m} \hat{s}_i(n + m)\,\hat{s}_i(m); \tag{19}$$

(5) Search for the time point $\delta_i$ corresponding to a "power spike" of $R_{\hat{s}\hat{s}}(n)$; and

(6) Determine the code corresponding to $\delta_i$.
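The steps above can be sketched with NumPy. One substitution should be noted: the sketch uses the real cepstrum (logarithm of the magnitude only) rather than the full complex logarithm of step (2), a common simplification that avoids phase unwrapping. The candidate delays, the frame length in the example and the small constant guarding log(0) are assumptions.

```python
import numpy as np


def real_cepstrum(frame):
    """Steps (1)-(3): FFT, log magnitude, inverse FFT (real cepstrum instead of complex)."""
    spectrum = np.fft.fft(frame)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))


def cepstral_autocorrelation(ceps, lag):
    """Step (4), Eq. (19): R(lag) = sum_m c(m + lag) * c(m)."""
    return float(np.dot(ceps[lag:], ceps[:len(ceps) - lag]))


def detect_delays(frame, candidate_delays, count=1):
    """Steps (5)-(6): return the `count` candidate delays with the largest R values."""
    ceps = real_cepstrum(frame)
    scores = {d: cepstral_autocorrelation(ceps, d) for d in candidate_delays}
    return sorted(scores, key=scores.get, reverse=True)[:count]
```

For example, adding a half-amplitude echo at 100 samples to a 4096-sample white-noise frame and searching the candidates 50, 100, 150 and 200 returns `[100]`. For multiple-echo hopping, `count` would be set to the number of echoes the frame's scheme uses.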
APPENDIX 3



An Example of Content-Sensitive Watermarking Modelling 

1. Audio Clustering and Embedding Scheme Design 

Suppose that there are only a limited number of audio signal classes in the
frequency space. Given a set of sample data (training data), audio clustering
trains a model to describe the classes. By observing the resulting clusters,
embedding schemes can be established according to their spectral
characteristics as follows:

(1) Segment the audio signal into m fixed-length frames;

(2) For each frame, extract the features using mel scale analysis:

$$V_i = (v_{i1}, v_{i2}, \ldots, v_{in}), \qquad i = 1, \ldots, m; \tag{20}$$

(3) Select four feature vectors in the vector space randomly and use them as the
initial centroids of the four classes:

$$C = \{C_1, C_2, C_3, C_4\}; \tag{21}$$

(4) Classify the sample frames into the four partitions in the feature space using
the nearest neighbour rule:

For j = 1 to 4, i = 1 to m:

$$V_i \in \mathrm{Class}(j) \ \text{if} \ \lVert V_i - C_j \rVert = \min_{k} \lVert V_i - C_k \rVert;$$

(5) Re-estimate the new centroids for each class:

$$\mathrm{Class}(j) = \{V_1^{(j)}, V_2^{(j)}, \ldots, V_{n_j}^{(j)}\}, \qquad C_j = \frac{1}{n_j} \sum_{k=1}^{n_j} V_k^{(j)}, \tag{22}$$

where $j = 1, 2, 3, 4$ and $\sum_{j=1}^{4} n_j = m$;

(6) Steps (4) and (5) are iterated until a convergence criterion is satisfied; 

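(7) Establish an embedding table for bit zero and bit one according to the HAS
model for each class. Time delay and energy are the major parameters:

Class 1: $(\alpha_0^{(1)}, \delta_{00}^{(1)}, \delta_{01}^{(1)}, \delta_{02}^{(1)}, \delta_{03}^{(1)})$ (zero bit), $(\alpha_1^{(1)}, \delta_{10}^{(1)}, \delta_{11}^{(1)}, \delta_{12}^{(1)}, \delta_{13}^{(1)})$ (one bit);

Class 2: $(\alpha_0^{(2)}, \delta_{00}^{(2)}, \delta_{01}^{(2)}, \delta_{02}^{(2)}, \delta_{03}^{(2)})$ (zero bit), $(\alpha_1^{(2)}, \delta_{10}^{(2)}, \delta_{11}^{(2)}, \delta_{12}^{(2)}, \delta_{13}^{(2)})$ (one bit);

Class 3: $(\alpha_0^{(3)}, \delta_{00}^{(3)}, \delta_{01}^{(3)}, \delta_{02}^{(3)}, \delta_{03}^{(3)})$ (zero bit), $(\alpha_1^{(3)}, \delta_{10}^{(3)}, \delta_{11}^{(3)}, \delta_{12}^{(3)}, \delta_{13}^{(3)})$ (one bit);

Class 4: $(\alpha_0^{(4)}, \delta_{00}^{(4)}, \delta_{01}^{(4)}, \delta_{02}^{(4)}, \delta_{03}^{(4)})$ (zero bit), $(\alpha_1^{(4)}, \delta_{10}^{(4)}, \delta_{11}^{(4)}, \delta_{12}^{(4)}, \delta_{13}^{(4)})$ (one bit);

where α represents the energy and δ the delay;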
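Steps (3) to (6) amount to k-means clustering with randomly chosen initial centroids. A minimal NumPy sketch, assuming Euclidean distance for the nearest-neighbour rule and "centroids stop moving" as the convergence criterion (the document does not name one); `train_centroids` and `assign` are hypothetical names.

```python
import numpy as np


def assign(features, centroids):
    """Step (4): index of the nearest centroid for every feature vector."""
    dist = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


def train_centroids(features, k=4, max_iter=100, seed=0):
    """Steps (3)-(6): k-means on the frame feature vectors V_i."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (3): k randomly selected feature vectors serve as the initial centroids.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign(features, centroids)
        # Step (5): each centroid becomes the mean of its class;
        # an empty class keeps its previous centroid.
        new = np.array([features[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Step (6): stop once the centroids no longer move.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(features, centroids)
```

Random initialisation can land in a poor local optimum; running the training several times with different seeds and keeping the tightest clustering is a common remedy.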

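In addition, the number of echoes to embed is decided by comparing the two
power summations of Eqs. (14) and (15) against a threshold λ, for a frame of class j:

If $P_{f<1\,\text{kHz}} \ge \lambda P_{f>1\,\text{kHz}}$, then embed one echo in this frame,
with embedding parameters $(\alpha_0^{(j)}, \delta_{00}^{(j)})$, $(\alpha_1^{(j)}, \delta_{10}^{(j)})$;

If $P_{f>1\,\text{kHz}} \le P_{f<1\,\text{kHz}} < \lambda P_{f>1\,\text{kHz}}$, then embed two echoes in this frame,
with embedding parameters $(\alpha_0^{(j)}, \delta_{00}^{(j)}, \delta_{01}^{(j)})$, $(\alpha_1^{(j)}, \delta_{10}^{(j)}, \delta_{11}^{(j)})$;

If $P_{f<1\,\text{kHz}} \le P_{f>1\,\text{kHz}} < \lambda P_{f<1\,\text{kHz}}$, then embed three echoes in this frame,
with embedding parameters $(\alpha_0^{(j)}, \delta_{00}^{(j)}, \delta_{01}^{(j)}, \delta_{02}^{(j)})$, $(\alpha_1^{(j)}, \delta_{10}^{(j)}, \delta_{11}^{(j)}, \delta_{12}^{(j)})$;

If $P_{f>1\,\text{kHz}} \ge \lambda P_{f<1\,\text{kHz}}$, then embed four echoes in this frame,
with embedding parameters $(\alpha_0^{(j)}, \delta_{00}^{(j)}, \ldots, \delta_{03}^{(j)})$, $(\alpha_1^{(j)}, \delta_{10}^{(j)}, \ldots, \delta_{13}^{(j)})$.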
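Read as thresholds on the low-band and high-band power sums of Eqs. (14) and (15), the four rules above reduce to a small decision function. This is only one reading of the rules: the threshold is passed in as a parameter `lam` (assumed greater than 1) because its value is not legible in the source, and the function name is hypothetical.

```python
def echo_count(p_low, p_high, lam):
    """Number of echoes to embed in a frame from P_{f<1kHz}, P_{f>1kHz} and a threshold lam > 1."""
    if p_low >= lam * p_high:
        return 1  # energy concentrated below 1 kHz
    if p_low >= p_high:
        return 2  # p_high <= p_low < lam * p_high
    if p_high < lam * p_low:
        return 3  # p_low < p_high < lam * p_low
    return 4      # p_high >= lam * p_low: energy concentrated above 1 kHz
```

Checking the cases in this order makes the four ranges mutually exclusive, with the boundary case of equal powers falling into the two-echo rule.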
2. Audio Classification and Embedding Scheme Selection

(1) Segment the audio signal into m fixed-length frames;

(2) Classify each frame $s_i$ into one of the four classes by the nearest neighbour rule:

$$s_i \in \mathrm{Class}(j) \ \text{if} \ \lVert V_i - C_j \rVert = \min_{k} \lVert V_i - C_k \rVert, \qquad i = 1, 2, \ldots, m;\ j = 1, 2, 3, 4;$$

(3) Select an embedding scheme for each frame from the embedding parameters table
according to its class identity and spectral analysis.
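Selection then reduces to a nearest-centroid lookup. A sketch, assuming the centroids $C_j$ come from the training stage and using a hypothetical `table` that maps each (0-based) class index to that class's embedding parameters:

```python
import numpy as np


def select_schemes(features, centroids, table):
    """Classify each frame by the nearest-neighbour rule and look up its embedding scheme.

    features: (m, n) array of frame feature vectors V_i.
    centroids: (k, n) array of class centroids C_j.
    table: mapping from class index (0-based) to embedding parameters (hypothetical).
    """
    features = np.asarray(features, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    dist = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    classes = dist.argmin(axis=1)
    return [table[int(j)] for j in classes]
```

In a full system the looked-up entry would be refined by the frame's echo count, so that only the first one to four delays of the class's parameter set are used.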
APPENDIX 4

An Audio Registration Method Based on Dynamic Time Warping 

The DTW technique resolves an optimal alignment path between two audio signals.
Both the audio signal under consideration and the reference audio signal are first
segmented into fixed-length frames, and then the power spectral parameters in each
frame are calculated using the mel scale method. An optimal path is generated that
gives the minimum dissimilarity between the reference audio and the tested audio
frame sequences. The registration is performed according to this optimal path,
whereby any possible shifting, scaling, or other non-linear time domain distortion can
be detected and recovered.

(1) For the original audio s and the watermarked audio s', segment both with the
same fixed frame length. Frames of s and s' can be expressed as $s_i$ $(i = 1, \ldots, m)$ and
$s'_j$ $(j = 1, \ldots, n)$, respectively;

(2) Extract features of the original and watermarked signals:

$$v_i = \{v_{il}\}, \qquad v'_j = \{v'_{jl}\},$$

where l is the channel number of the mel scales;
(3) Find an optimal alignment path between the original and watermarked signals:

(a) Initialisation:

Define local constraints and global path constraints;

(b) Recursion:

For $1 \le i \le m$, $1 \le j \le n$ such that i and j stay within the allowable grid,
calculate

$$D_{ij} = \min_{(i', j')} \left[ D_{i'j'} + \zeta\big((i', j'), (i, j)\big) \right], \tag{23}$$

where

$$\zeta\big((i', j'), (i, j)\big) = \sum_{k=0}^{L_s - 1} d_{i-k,\, j-k}, \tag{24}$$

with $L_s$ being the number of moves in the path from $(i', j')$ to $(i, j)$, so that

$$i - L_s = i', \qquad j - L_s = j', \tag{25}$$

and

$$d_{ij} = \sqrt{\sum_{l} \left(v_{il} - v'_{jl}\right)^{2}}; \tag{26}$$

(c) Termination: $D_{mn}$;

(d) Form an optimal path from (1, 1) to (m, n) according to $D_{ij}$:

$$P = \{\, p_{ij} \mid i \in [1, \ldots, m],\ j \in [1, \ldots, n] \,\}; \tag{27}$$

(4) Register the watermarked audio with the original audio according to the optimal
path:

For each $p_{ij} \in P$:

If $i < j$, add the ith frame of s to s';
If $i > j$, remove the jth frame from s'.
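A minimal DTW sketch for steps (3)(b) to (3)(d), assuming the simplest local constraint (single horizontal, vertical or diagonal moves with unit weights) and no global path constraint; the function name `dtw_path` is hypothetical, and the registration rule of step (4) is left out.

```python
import numpy as np


def dtw_path(ref, test):
    """Return the optimal alignment path between feature sequences ref (m x L) and test (n x L).

    Path entries are 1-based (i, j) pairs from (1, 1) to (m, n).
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    m, n = len(ref), len(test)
    # Eq. (26): Euclidean distance between every pair of frame feature vectors.
    d = np.sqrt(((ref[:, None, :] - test[None, :, :]) ** 2).sum(axis=2))
    # Eq. (23): accumulated distance; row and column 0 act as the boundary.
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Step (d): backtrack from (m, n), preferring the diagonal move on ties.
    path, i, j = [], m, n
    while (i, j) != (0, 0):
        path.append((i, j))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda p: D[p])
    return path[::-1]
```

For example, aligning a four-frame sequence with a copy in which one frame has been duplicated yields a path with a single horizontal step, (2, 2) followed by (2, 3), which marks the inserted frame.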
The claims defining the invention are as follows:

1. A method of embedding a watermark in a digital audio signal, said
method including the steps of:

embedding at least one echo dependent upon said watermark in a portion of
said digital audio signal, predefined characteristics of said at least one echo being
dependent upon time and/or frequency domain characteristics of said portion of said
digital audio signal to provide a substantially inaudible and robust embedded
watermark in said digital audio signal.

2. The method according to claim 1, further including the step of
digesting said digital audio signal to provide a watermark key, said watermark being
dependent upon said watermark key.

3. The method according to claim 2, further including the step of 
encrypting predetermined information using said watermark key to form said 
watermark. 

4. The method according to claim 1, further including the step of
generating said at least one echo to have a delay and an amplitude relative to said
digital audio signal that is substantially inaudible.

5. The method according to claim 4, wherein the value of said delay and
said amplitude are programmable.

6. The method according to claim 1, wherein two or more echoes are
programmably sequenced having different delays and/or amplitudes.

7. The method according to claim 1, wherein two portions of said digital 
audio signal are embedded with different echoes dependent upon the time and/or 
frequency characteristics of said digital audio signal. 

8. An apparatus for embedding a watermark in a digital audio signal, said
apparatus including:

means for determining time and/or frequency domain characteristics of said
digital audio signal;

means for embedding at least one echo dependent upon said watermark in a
portion of said digital audio signal, predefined characteristics of said at least one echo
being dependent upon said time and/or frequency domain characteristics of said
portion of said digital audio signal to provide a substantially inaudible and robust
embedded watermark in said digital audio signal.

9. The apparatus according to claim 8, further including means for
digesting said digital audio signal to provide a watermark key, said watermark being
dependent upon said watermark key.

10. The apparatus according to claim 9, further including means for 
encrypting predetermined information using said watermark key to form said 
watermark. 

11. The apparatus according to claim 8, further including means for
generating said at least one echo to have a delay and an amplitude relative to said
digital audio signal that is substantially inaudible.

12. The apparatus according to claim 11, wherein the value of said delay
and said amplitude are programmable.

13. The apparatus according to claim 8, wherein two or more echoes are
programmably sequenced having different delays and/or amplitudes.

14. The apparatus according to claim 8, wherein two portions of said 
digital audio signal are embedded with different echoes dependent upon the time 
and/or frequency characteristics of said digital audio signal. 

15. A computer program product having a computer readable medium
having a computer program recorded therein for embedding a watermark in a digital
audio signal, said computer program product including:

means for determining time and/or frequency domain characteristics of said
digital audio signal;

means for embedding at least one echo dependent upon said watermark in a
portion of said digital audio signal, predefined characteristics of said at least one echo
being dependent upon said time and/or frequency domain characteristics of said
portion of said digital audio signal to provide a substantially inaudible and robust
embedded watermark in said digital audio signal.

16. The computer program product according to claim 15, further
including means for digesting said digital audio signal to provide a watermark key,
said watermark being dependent upon said watermark key.

17. The computer program product according to claim 16, further 
including means for encrypting predetermined information using said watermark key 
to form said watermark. 

18. The computer program product according to claim 15, further
including means for generating said at least one echo to have a delay and an
amplitude relative to said digital audio signal that is substantially inaudible.

19. The computer program product according to claim 18, wherein the
value of said delay and said amplitude are programmable.

20. The computer program product according to claim 15, wherein two or
more echoes are programmably sequenced having different delays and/or amplitudes.

21. The computer program product according to claim 15, wherein two
portions of said digital audio signal are embedded with different echoes dependent
upon the time and/or frequency characteristics of said digital audio signal.

22. A method of extracting a watermark from a watermarked digital audio
signal, said method including the steps of:

detecting at least one echo embedded in a portion of said watermarked digital
audio signal, predefined characteristics of said at least one echo being dependent upon
time and/or frequency domain characteristics of said portion of a corresponding
original digital audio signal; and

decoding said at least one detected echo to recover said watermark.

23. The method according to claim 22, further including the step of 
registering said watermarked digital audio signal with said original audio signal to 
recover from any distortions and/or modifications of said watermarked digital audio 
signal. 

24. The method according to claim 22, wherein said decoding step is
dependent upon an embedding scheme.

25. The method according to claim 22, further comprising the step of 
decrypting one or more codes produced by said decoding step dependent upon a 
digested digital audio signal. 

26. The method according to claim 22, wherein said at least one echo has a 
delay and an amplitude relative to said digital audio signal that is substantially 
inaudible. 

27. The method according to claim 26, wherein the value of said delay and 
said amplitude are programmable. 

28. The method according to claim 22, wherein two or more echoes are 
programmably sequenced having different delays and/or amplitudes. 

29. The method according to claim 22, wherein two portions of said
watermarked digital audio signal are embedded with different echoes dependent upon
the time and/or frequency characteristics of said original digital audio signal.

30. An apparatus for extracting a watermark from a watermarked digital
audio signal, said apparatus including:

means for detecting at least one echo embedded in a portion of said
watermarked digital audio signal, predefined characteristics of said at least one echo
being dependent upon time and/or frequency domain characteristics of said portion of
a corresponding original digital audio signal; and

means for decoding said at least one detected echo to recover said watermark.

31. The apparatus according to claim 30, further including means for registering said
watermarked digital audio signal with said original audio signal to recover from any
distortions and/or modifications of said watermarked digital audio signal.

32. The apparatus according to claim 30, wherein said decoding means is 
dependent upon an embedding scheme. 

33. The apparatus according to claim 30, further comprising means for
decrypting one or more codes produced by said decoding means dependent upon a
digested digital audio signal.

34. The apparatus according to claim 30, wherein said at least one echo 
has a delay and an amplitude relative to said digital audio signal that is substantially 
inaudible. 

35. The apparatus according to claim 34, wherein the value of said delay
and said amplitude are programmable.

36. The apparatus according to claim 30, wherein two or more echoes are
programmably sequenced having different delays and/or amplitudes.

37. The apparatus according to claim 30, wherein two portions of said
watermarked digital audio signal are embedded with different echoes dependent upon
the time and/or frequency characteristics of said original digital audio signal.

38. A computer program product having a computer readable medium
having a computer program recorded therein for extracting a watermark from a
watermarked digital audio signal, said computer program product including:

means for detecting at least one echo embedded in a portion of said
watermarked digital audio signal, predefined characteristics of said at least one echo
being dependent upon time and/or frequency domain characteristics of said portion of
a corresponding original digital audio signal; and

means for decoding said at least one detected echo to recover said watermark.

39. The computer program product according to claim 38, further including means for
registering said watermarked digital audio signal with said original audio signal to
recover from any distortions and/or modifications of said watermarked digital audio
signal.

40. The computer program product according to claim 38, wherein said 
decoding means is dependent upon an embedding scheme. 

41. The computer program product according to claim 38, further comprising
means for decrypting one or more codes produced by said decoding means dependent
upon a digested digital audio signal.

42. The computer program product according to claim 38, wherein said at
least one echo has a delay and an amplitude relative to said digital audio signal that is
substantially inaudible.

43. The computer program product according to claim 42, wherein the 
value of said delay and said amplitude are programmable. 

44. The computer program product according to claim 38, wherein two or 
more echoes are programmably sequenced having different delays and/or amplitudes. 

45. The computer program product according to claim 38, wherein two
portions of said watermarked digital audio signal are embedded with different echoes
dependent upon the time and/or frequency characteristics of said original digital audio
signal.

46. A method of embedding a watermark in a digital audio signal, said
method including the steps of:

generating a digital watermark;

adaptively segmenting said digital audio signal dependent upon at least one
frequency and/or time domain characteristic into two or more frames containing
respective portions of said digital audio signal;

classifying each frame dependent upon at least one frequency and/or time
domain characteristic of said portion of said digital audio signal in said frame; and

embedding at least one echo in at least one of said frames, said echo being
dependent upon said watermark and upon a classification of each frame determined
by said classifying step, whereby a watermarked digital audio signal is produced.

47. The method according to claim 46, wherein said watermark is
dependent upon said digital audio signal.

48. The method according to claim 47, further including the steps of:
audio digesting said digital audio signal to provide an audio digest; and
encrypting watermark information dependent upon said audio digest.

49. The method according to claim 46, further including the step of
extracting one or more features from each frame of said digital audio signal.

50. The method according to claim 49, further including the step of
selecting an embedding scheme for each frame dependent upon said classification of
each frame, said embedding scheme adapted dependent upon at least one time and/or
frequency domain characteristic of said classification for the corresponding portion of
said digital audio signal.
51. The method according to claim 50, further including the step of
embedding said at least one echo in at least one of said frames dependent upon the
selected embedding scheme.

52. The method according to claim 51, wherein the amplitude and the
delay of said echo relative to the corresponding portion of said digital audio signal in
said frame are defined dependent upon the embedding scheme so as to be inaudible.

53. The method according to claim 52, wherein at least two echoes are
embedded in said frame.

54. The method according to claim 46, wherein two or more echoes 
embedded in said digital audio signal are dependent upon a bit of said watermark. 

55. An apparatus for embedding a watermark in a digital audio signal, said
apparatus including:

means for generating a digital watermark; 

means for adaptively segmenting said digital audio signal dependent upon at
least one frequency and/or time domain characteristic into two or more frames
containing respective portions of said digital audio signal;

means for classifying each frame dependent upon at least one frequency 
and/or time domain characteristic of said portion of said digital audio signal in said 
frame; and 
means for embedding at least one echo in at least one of said frames, said echo
being dependent upon said watermark and upon a classification of each frame
determined by said classifying means, whereby a watermarked digital audio signal is
produced.



56. The apparatus according to claim 55, wherein said watermark is
dependent upon said digital audio signal.



57. The apparatus according to claim 56, further including:

means for audio digesting said digital audio signal to provide an audio digest;
and

means for encrypting watermark information dependent upon said audio
digest.



58. The apparatus according to claim 55, further including means for
extracting one or more features from each frame of said digital audio signal.



59. The apparatus according to claim 58, further including means for 
selecting an embedding scheme for each frame dependent upon said classification of 
each frame, said embedding scheme adapted dependent upon at least one time and/or 
frequency domain characteristic of said classification for the corresponding portion of 
said digital audio signal. 
60. The apparatus according to claim 59, further including means for
embedding said at least one echo in at least one of said frames dependent upon the
selected embedding scheme.

61. The apparatus according to claim 60, wherein the amplitude and the
delay of said echo relative to the corresponding portion of said digital audio signal in
said frame are defined dependent upon the embedding scheme so as to be inaudible.

62. The apparatus according to claim 61, wherein at least two echoes are
embedded in said frame.

63. The apparatus according to claim 55, wherein two or more echoes 
embedded in said digital audio signal are dependent upon a bit of said watermark. 

64. A computer program product having a computer readable medium
having a computer program recorded therein for embedding a watermark in a digital
audio signal, said computer program product including:

means for generating a digital watermark; 

means for adaptively segmenting said digital audio signal dependent upon at
least one frequency and/or time domain characteristic into two or more frames
containing respective portions of said digital audio signal;

means for classifying each frame dependent upon at least one frequency 
and/or time domain characteristic of said portion of said digital audio signal in said 
frame; and 
means for embedding at least one echo in at least one of said frames, said echo
being dependent upon said watermark and upon a classification of each frame 
determined by said classifying means, whereby a watermarked digital audio signal is 
produced. 

65. The computer program product according to claim 64, wherein said 
watermark is dependent upon said digital audio signal. 



66. The computer program product according to claim 65, further
including:

means for audio digesting said digital audio signal to provide an audio digest; and

means for encrypting watermark information dependent upon said audio digest.

67. The computer program product according to claim 64, further 
including means for extracting one or more features from each frame of said digital 
audio signal. 



68. The computer program product according to claim 67, further
including means for selecting an embedding scheme for each frame dependent upon
said classification of each frame, said embedding scheme adapted dependent upon at
least one time and/or frequency domain characteristic of said classification for the
corresponding portion of said digital audio signal.

69. The computer program product according to claim 68, further
including means for embedding said at least one echo in at least one of said frames
dependent upon the selected embedding scheme.

70. The computer program product according to claim 69, wherein the
amplitude and the delay of said echo relative to the corresponding portion of said
digital audio signal in said frame are defined dependent upon the embedding scheme so
as to be inaudible.

71. The computer program product according to claim 70, wherein at least
two echoes are embedded in said frame.

72. The computer program product according to claim 64, wherein two or
more echoes embedded in said digital audio signal are dependent upon a bit of said
watermark.

73. A method of extracting a watermark from a watermarked digital audio 
signal, said method including the steps of: 

adaptively segmenting said watermarked digital audio signal into two or more
frames containing corresponding portions of said watermarked digital audio signal;

detecting at least one echo present in said frames; and 

code mapping said at least one detected echo to extract an embedded watermark, said 
mapping being dependent upon one or more embedding schemes used to embed said 
at least one echo in said watermarked digital audio signal. 
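FIG. 5 outlines the extraction steps: FFT, complex logarithm, inverse FFT, auto-correlation, a search for delays with "power spikes", and mapping those delays to code. The sketch below is a simplified variant, not the patented procedure. It uses the real cepstrum (logarithm of the magnitude spectrum rather than the complex logarithm) and reads the cepstrum directly at the candidate delays instead of computing an auto-correlation. The reasoning is that an echo of amplitude a at delay d adds approximately a/2 to the cepstrum at index d. The kernel table and function names are hypothetical and must match whatever was used at embedding time.

```python
import numpy as np

# Hypothetical delay/amplitude pairs; they must match those used at embedding time
ECHO_KERNELS = {
    0: [(50, 0.35), (80, 0.25)],
    1: [(110, 0.35), (140, 0.25)],
}


def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum (a real, not complex, cepstrum)."""
    spectrum = np.fft.fft(frame)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))


def detect_bit(frame, kernels=ECHO_KERNELS):
    """Code-map the cepstral peaks at each candidate delay set to the best-matching bit."""
    cep = real_cepstrum(frame)
    # An echo of amplitude a at delay d adds roughly a/2 to the cepstrum at index d
    scores = {bit: sum(cep[d] for d, _ in echoes) for bit, echoes in kernels.items()}
    return max(scores, key=scores.get)
```

Decoding a whole signal would apply `detect_bit` to each frame in turn. The segmentation must reproduce the embedder's frame boundaries, which is why claim 73 ties the mapping to the embedding schemes used.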




74. The method according to claim 73, further including the step of audio 
registering said watermarked digital audio signal with said original digital audio 
signal to determine any unauthorised modifications of said watermarked digital audio 
signal.
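FIG. 9 lists the audio registration steps: calculate the power in each frame using the mel scale, find the best time alignment using DTW (dynamic time warping), and perform audio registration. The sketch below assumes that registration means aligning the frames of the watermarked signal with those of the original before any comparison. It computes log mel-band power with a plain triangular filterbank and then a classic DTW alignment path. The band count, frame length and all function names are illustrative assumptions.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_power(signal, sample_rate, frame_len=1024, n_bands=20):
    """Log power in triangular mel-spaced bands, one row per frame."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    fbank = np.zeros((n_bands, len(freqs)))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return np.log(power @ fbank.T + 1e-12)


def dtw_path(a, b):
    """Minimum-cost monotone alignment between feature sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
    return path[::-1]
```

Once the path is known, each watermarked frame can be compared with its aligned original frame. Large residual differences would flag candidate unauthorised modifications. How the patent scores those differences is not stated here, so any threshold would be a further assumption.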



75. The method according to claim 73, further including the step of 
decrypting said embedded watermark dependent upon an audio digest signal to derive 
watermark information, said audio digest signal being dependent upon an original 
digital audio signal. 



76. An apparatus for extracting a watermark from a watermarked digital 
audio signal, said apparatus including: 

means for adaptively segmenting said watermarked digital audio signal into
two or more frames containing corresponding portions of said watermarked digital
audio signal;

means for detecting at least one echo present in said frames; and

means for code mapping said at least one detected echo to extract an embedded 
watermark, said mapping being dependent upon one or more embedding schemes 
used to embed said at least one echo in said watermarked digital audio signal. 



77. The apparatus according to claim 76, further including means for audio 
registering said watermarked digital audio signal with said original digital audio 
signal to determine any unauthorised modifications of said watermarked digital audio
signal. 
78. The apparatus according to claim 76, further including means for
decrypting said embedded watermark dependent upon an audio digest signal to derive 
watermark information, said audio digest signal being dependent upon an original 
digital audio signal. 



79. A computer program product having a computer readable medium 
having a computer program recorded therein for extracting a watermark from a 
watermarked digital audio signal, said computer program product including: 

means for adaptively segmenting said watermarked digital audio signal into
two or more frames containing corresponding portions of said watermarked digital
audio signal;

means for detecting at least one echo present in said frames; and 

means for code mapping said at least one detected echo to extract an embedded
watermark, said mapping being dependent upon one or more embedding schemes
used to embed said at least one echo in said watermarked digital audio signal.



80. The computer program product according to claim 79, further
including means for audio registering said watermarked digital audio signal with said
original digital audio signal to determine any unauthorised modifications of said
watermarked digital audio signal.



81. The computer program product according to claim 79, further
including means for decrypting said embedded watermark dependent upon an audio
digest signal to derive watermark information, said audio digest signal being
dependent upon an original digital audio signal.



WO 00/39955    PCT/SG98/00111




FIG. 2: Divide each frame into multiple sub-frames (200); map embedding scheme in sub-frames (210); code sub-frames according to embedding schemes (220).



FIG. 3: For each sub-frame, calculate the delay of the original signal (310); add the delay to the original signal (320).






FIG. 5 (steps 500 to 550): Convert watermarked audio frames into the frequency domain using FFT; calculate the complex logarithm; take the inverse FFT; calculate the auto-correlation; search the time delays corresponding to "power spikes"; determine the code according to the delays.




FIG. 6

FIG. 7: Hash transform (710); generate watermark key (720).




FIG. 9: Calculate the power in each frame using the mel scale; find the best time alignment using DTW (dynamic time warping); perform audio registration.







FIG. 10

FIG. 12: Portion 1100 of original audio signal S[ω] (horizontal axis: frequency ω).



FIG. 11A: Original audio signal S[n] (amplitude versus time).

FIG. 11B: Echo S'[n] in the 1st sub-frame (1110).

FIG. 11C: Echo S''[n].

FIG. 11D: Frame 1100 of the watermarked audio signal, S[n] + echoes S'[n] + S''[n] (amplitude versus time).



INTERNATIONAL SEARCH REPORT 



International application No.

PCT/SG 98/00111



A. CLASSIFICATION OF SUBJECT MATTER
IPC: H 04 L 9/00

According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 

IPC: H 04 L, H 04 N



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
WPI 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category*    Citation of document, with indication, where appropriate, of the relevant passages    Relevant to claim No.

A    US 5 822 432 A (MOSKOWITZ et al.) 13 October 1998 (13.10.98), abstract; column 1, line 6 to column 6, line 13; fig. 1, 2.
     Relevant to claim No.: 1, 2, 8, 10, 15-17, 30, 38, 46, 55, 57, 64, 73, 76, 79

A    EP 0 766 468 A2 (NEC CORPORATION) 02 April 1997 (02.04.97), abstract; column 1, line 3 to column 4, line 5; column 7, line 32 to column 13, line 15; fig. 1-3.
     Relevant to claim No.: 1, 8, 15, 30, 38, 46, 55, 64, 73, 76, 79

A    EP 0 651 554 A1 (EASTMAN KODAK COMPANY) 03 May 1995 (03.05.95), abstract; column 3, line 18 to column 4, line 47; column 6, line 43 to column 9, line 19; fig. 1-4.
     Relevant to claim No.: 1, 3, 8, 10, 15, 22, 38, 46, 55, 57, 64, 73, 79



□ Further documents are listed in the continuation of Box C.

☒ See patent family annex.



* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier application or patent but published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family



Date of the actual completion of the international search 

29 October 1999 (29.10.99) 


Date of mailing of the international search report

12 November 1999 (12.11.99)

Name and mailing address of the ISA/AT

Austrian Patent Office
Kohlmarkt 8-10; A-1014 Vienna

Facsimile No. 1/53424/200

Authorized officer

Hajos

Telephone No. 1/53424/410

Form PCT/ISA/210 (second sheet) (July 1998)






INTERNATIONAL SEARCH REPORT

Information on patent family members

International application No.

PCT/SG 98/00111

Patent document cited in search report, publication date, and patent family member(s):

US A 5822432, published 13-10-1998
     Patent family: AU, WO, US 5905800

EP A2 766468, published 02-04-1997
     Patent family: AU, AU, CA 2184949, EP 766468, JP, US 5930369

EP A1 651554, published 03-05-1995
     Patent family: JP 7212712

Form PCT/ISA/210 (patent family annex) (July 1998)



